RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning
TABLE 3.2
Accuracy (%) of PCNN-22 and PCNN-40 (based on WRN-22 and WRN-40, respectively) on the CIFAR-10 dataset with different λ.

Model      λ = 1e-3   λ = 1e-4   λ = 1e-5   λ = 0
PCNN-22    91.92      92.79      92.24      91.52
PCNN-40    92.85      93.78      93.65      92.84
Despite the progress made in 1-bit quantization and network pruning, few works have
combined the two in a unified framework in which they reinforce each other. It is necessary
to introduce pruning techniques into 1-bit CNNs, since not all filters and kernels are equally
important or worth quantizing in the same way. One potential solution is to prune the
network first and then perform 1-bit quantization on the remaining weights to produce a
more compressed network. However, this solution ignores the difference between binarized
and full-precision parameters during pruning. A promising alternative is therefore to prune
the quantized network; yet designing a unified framework that combines quantization and
pruning remains an open question.
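As a rough illustration of the "prune, then binarize" baseline discussed above (not the RBCN method itself), the sketch below applies magnitude-based filter pruning to a convolutional weight tensor and then binarizes the surviving filters with the sign function plus a per-filter scale. The layer shapes, keep ratio, and helper name are placeholder assumptions.

```python
import torch

def prune_then_binarize(weight: torch.Tensor, keep_ratio: float = 0.5):
    """Naive 'prune, then binarize' baseline (illustrative, not RBCN).

    weight: full-precision conv filters of shape (out_ch, in_ch, k, k).
    keep_ratio: fraction of output filters kept after magnitude pruning.
    """
    out_ch = weight.shape[0]
    n_keep = max(1, int(out_ch * keep_ratio))

    # Rank filters by their L1 norm and keep the largest ones.
    importance = weight.abs().sum(dim=(1, 2, 3))
    keep_idx = torch.topk(importance, n_keep).indices
    pruned = weight[keep_idx]

    # Binarize the surviving weights; a per-filter scale preserves the
    # magnitude information that sign() alone would discard.
    scale = pruned.abs().mean(dim=(1, 2, 3), keepdim=True)
    binary = torch.sign(pruned) * scale
    return binary, keep_idx

# Example: a hypothetical 3x3 conv layer with 64 output and 32 input channels.
w = torch.randn(64, 32, 3, 3)
w_bin, kept = prune_then_binarize(w, keep_ratio=0.5)
```

Because the pruning decision here is made on the full-precision weights, it ignores how binarization later changes each filter's contribution, which is precisely the shortcoming noted above.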
To address these issues, we introduce the Rectified Binary Convolutional Network
(RBCN) [148] to train a BNN, in which a novel learning architecture is presented within a
GAN framework. Our motivation is based on the fact that GANs can match two data
distributions (here, those of the full-precision and 1-bit networks). This can also be viewed as
distilling/exploiting the full-precision model to benefit its 1-bit counterpart. For training RBCN,
the primary binarization process is illustrated in Fig. 3.18, where the full-precision model
and the 1-bit model (the generator) provide “real” and “fake” feature maps, respectively, to
the discriminators.
FIGURE 3.18
The framework for integrating the Rectified Binary Convolutional Network (RBCN) with
generative adversarial network (GAN) learning. The full-precision model provides “real”
feature maps, while the 1-bit model (as a generator) provides “fake” feature maps to
discriminators that try to distinguish “real” from “fake.” Meanwhile, the generator tries
to fool the discriminators. As this process is repeated, both the full-precision feature maps
and kernels (across all convolutional layers) are fully exploited to enhance the capacity of
the 1-bit model. Note that (1) the full-precision model is used only in learning, not in
inference; and (2) after training, the learned full-precision filters W are discarded, and only
the binarized filters Ŵ and the shared learnable matrices C are kept in RBCN for computing
the feature maps in inference.
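The adversarial coupling described in the caption can be sketched as follows. This is a minimal, assumed illustration of matching full-precision ("real") and 1-bit ("fake") feature maps with a small per-layer discriminator; it is not the full RBCN training procedure, which additionally involves the rectified convolution, kernel-level losses, and the learnable matrices C. The discriminator architecture, optimizers, and function names are placeholders.

```python
import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    """Judges whether a feature map comes from the full-precision teacher
    ("real") or from the 1-bit generator ("fake"). Illustrative only."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)  # one logit per feature map

def gan_step(real_feat, fake_feat, disc, opt_d, opt_g):
    """One adversarial update: real_feat from the full-precision model,
    fake_feat from the 1-bit generator (must require grad w.r.t. it)."""
    bce = nn.BCEWithLogitsLoss()

    # Discriminator step: label teacher features 1 ("real"), 1-bit features 0 ("fake").
    opt_d.zero_grad()
    d_loss = (bce(disc(real_feat.detach()), torch.ones(real_feat.size(0), 1)) +
              bce(disc(fake_feat.detach()), torch.zeros(fake_feat.size(0), 1)))
    d_loss.backward()
    opt_d.step()

    # Generator step: the 1-bit network tries to make its features look "real",
    # i.e., to make the discriminator misclassify them.
    opt_g.zero_grad()
    g_loss = bce(disc(fake_feat), torch.ones(fake_feat.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

In such a setup, `opt_g` would cover the 1-bit generator's parameters and `opt_d` the discriminator's, and the step would typically be applied per convolutional layer alongside the usual task loss, so that the full-precision feature maps guide the binarized model throughout training.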